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ABSTRACT 


The  theory  of  optimum  least-squares  multichannel  filters 
is  developed  from  the  definitions  of  correlation  functions 
and  power  spectra.  Four  types  of  these  optimum  processors 
were  applied  to  an  Aleutian  earthquake  recorded  by  part  of 
the  Montana  LASA  and  the  results  evaluated.  It  appears  that 
certain  isotropic  processors  can  give  a  modest  signal-to- 
noise  ratio  improvement  in  excess  cf  1  db  over  the  prefilter¬ 
ed  straight  sum.  Evaluation  of  the  outputs  and  transfer 
functions  of  these  filters  gives  some  insight  into  methods 
for  improving  their  gain  and  stability.  Several  other  topics 
are  discussed  from  a  theoretical  point  of  view;  these  in¬ 
clude  estimation  of  power  spectral  density  functions,  frequency- 
wavenumber  spectra,  and  certain  coherence  functions. 


INTRODUCTION 


This  report  concludes  the  first  phase  of  the  SDL  multi¬ 
channel  filter  project;  namely,  the  design  and  evaluation 
of  certain  multichannel  filters.  It  has  two  parts:  (1)  the 
theoretical  specification  and  formal  solution  of  the  digital 
filtering  problem;  and  (2)  the  experimental  results  obtained 
by  applying  the  various  multichannel  filters  to  actual  seismic 
array  data.  On  this  basis  certain  conclusions  have  been  drawn 
about  the  performance  of  the  various  multichannel  filters. 


The  practical  computation  of  many  of  these  filter  oper¬ 
ators  relies  heavily  on  the  fast  Fourier  transform  algorithm 
of  Cooley  and  Tukey,  without  which  many  of  the  calculations 
reported  here  would  be  too  expensive  to  be  practical  and  some 
would  not  even  be  possible.  An  earlier  report  (McCowan,  1966) 
dealt  with  the  transform  method  and  its  associated  techniques 
and  the  present  report  can  be  regarded  as  a  logical  develop¬ 
ment  of  the  ideas  presented  there. 


Throughout  this  report  the  standard  of  comparison  used 
to  grade  multichannel  filter  performance  is  the  unbeamsteered 
sum  of  the  data  channels.  This  was  chosen  instead  of  the 
beamsteered  or  phased  sum  simply  as  a  matter  of  convenience. 
Straight  summation  is  itself  an  optimum  processor  when  the 
signals  are  aligned  and  the  noise  is  uncorrelated.  Any  de¬ 
partures  from  this  model  either  in  signal  misalignment  or 
noise  correlation,  will,  of  course,  detract  from  the  theo¬ 
retically  best  performance,  which  is  the  square  root  of  the 
number  of  sensors  increase  in  signal  to  noise  (S/N)  ratio 
improvement  measured  in  decibels  (db).  This  is  where  the 
multichannel  filter  is  applicable;  the  design  equations  for 
which  are  a  theoretical  statement  of  these  departures  in  signal 
and  noise  models.  Filters  satisfying  these  equations  should 
provide  an  increase  in  S/N  ratio  improvement  over  the  other¬ 
wise  optimum  processor,  the  straight  sum. 
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The  data  used  in  the  experimental  part  of  this  report 
were  from  an  Aleutian  earthquake  recorded  on  ten  sensors  in 
the  D1  subarray  of  the  Montana  LASA.  The  pertinent  facts 
about  the  event  are  listed  in  Table  1.  Ten  sensors  were 
used  instead  of  the  original  twenty-five  available  in  the 
D1  subarray  to  reduce  the  computation  time  and  to  simplify 
the  procedure  of  tabulating  results.  Calibrations  were  com¬ 
puted  in  the  usual  manner  and  all  traces  were  corrected  to 
true  ground  motion.  Table  2  lists  the  coordinates  and  cali¬ 
brations  (demagnif ications )  for  the  set  of  ten  sensors.  This 
particular  set  was  selected  to  provide  a  reasonably  geometric 
array  with  good  element  separation.  Figure  1  is  a  map  of  the 
reduced  array. 

For  some  parts  of  the  study,  the  traces  were  prefiltered 
with  the  "SDL"  bandpass  filter.  This  filter  is  a  phaseless 
four-pole  Butterworth  recursive  filter.  It  was  chosen  to 
provide  a  balance  between  excessive  signal  distortion  at  one 
end  and  good  noise  reduction  at  the  other.  The  frequency  re¬ 
sponse  of  this  filter  is  shown  in  Figure  2. 

As  in  previous  SDL  data-processmg  reports,  the  definit¬ 
ion  of  signal  amplitude  will  be  one-half  the  maximum  peak-to- 
trough  amplitude  occurring  in  the  first  three  cycles  of  the 
P  phase.  Noise  is  defined  as  the  root-mean-square  amplitude 
measured  over  some  specified  interval.  The  S/N  ratio  is  the 
quotient  of  these  two  quantities.  Frequently  results  will 
be  expressed  in  db  for  which  the  usual  expression  will  be 
used,  20  log1Q (RATIO ) .  When  the  quantities  to  be  expressed 
in  terms  of  db  have  the  dimensions  of  power,  this  definition 
has  been  changed  to  10  loglQ (RATIO)  in  order  to  preserve  the 
proper  amplitude  ratio. 

Finally,  much  information  in  this  report  is  presented 
in  contour  plots.  These  plots  are  drawn  by  the  computer  on 
the  printer.  Accuracy  and  resolution  in  these  plots  are 
balanced  against  the  time  necessary  to  compute  and  print 
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them.  As  a  result,  sometimes  the  fine  detail  predicted  by 
the  theory  is  lost,  Nevertheless,  printer  contour  plots 
have  been  used  unretouched  because  they  are  an  efficient 
representation  of  large  amounts  of  information.  For  each 
plot  we  computed  the  function  to  be  plotted  only  at  a  grid 
of  21  x  21  evenly  spaced  points  in  the  horizontal  and  verti¬ 
cal  directions,  and  fitted  in  a  grid  of  60  x  10 (J  points  by 
bilinear  interpolation. 

A.  MEASUREMENT  OF  SEISMIC  NOISE 


The  design  of  digital  filters  and  the  analysis  of  their 
inputs  and  outputs  require  the  estimation  of  such  quantities 
as  correlation  functions,  power  spectral  density  functions, 
frequency-wavenumber  spectra  and  various  coherence  functions. 
These  measurements  are  the  connections  between  theoretical 
derivations  and  actual  practice.  In  this  section  we  will 
describe  the  methods  we  have  selected  for  computing  these 
functions  as  well  as  any  ccnsequences  these  methods  may  imply 
on  the  accuracy  or  form  of  the  results. 


1.  Correlation  Functions.  In  the  continuous  case  a  general 
correlation  function  is  defined  by  an  integral  similar  to  a 
convolution: 


(t) 


1 

T 


rT 

j  xi(T> 


Xj(t+t)  dt 


(1) 


When  i= j ,  R  is  an  autocorrelation  and  when  ii j ,  it  is  a  cross¬ 
correlation.  In  the  discrete  case  the  integral  becomes  a  sun: 


Ri- (t)  =  l  X. (t)  X. (t+t)  (2) 

where  t  and  t  are  now  indices.  In  the  work  that  follows  they 
will  still  be  written  as  arguments  to  avoid  confusion. 

It  is  usually  assumed  that  the  time  series  under  consider¬ 
ation  are  zero  outside  the  region  of  interest.  This  has  the 
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result  of  bounding  the  correlation  function  and  ensuring  that 
it  tends  towards  zero  as  100%  lags,  the  number  of  lags  equal 
to  the  number  of  points  in  the  time  series,  are  taken.  This 
condition  can  be  written  explicitly  by  limiting  the  sum  in 
equation  2: 

T-t-1 

Rin(t)  =  I  X.(T)  X.Ct+x)  (3) 

J  T=0  1  3 

Negative  lags  are  taken  care  of  by  use  of  the  identity: 

Rij(t)  =  (4) 

One  further  generalization  is  necessary  to  take  into 
account  stationarity .  If  the  time  series  under  consideration 
are  stationary  in  time  then  it  does  not  matter  what  sample  is 
used  (provided  it  if  long  enough  to  ensure  stability)  to  es¬ 
timate  the  correlation  functions.  That  is: 

T-t+l+s 

Rij<t)  =  l  Xi(t)  X.(t+x)  (5) 

T  =  S  J 

for  all  s . This  will  be  needed  later  in  the  least-squares 
derivation. 

2.  Power  Spectral  Density  Functions.  Prior  to  the  develop¬ 
ment  of  the  fast  Fourier  transform  method,  power  spectra  were 
invaribly  computed  by  transforming  the  respective  correlation 
function.  This  method  is  usually  referred  to  as  the  Blackman- 
Tukey  (B-T)  method  of  spectral  estimation  (Blackman  and  Tukey, 
1959).  However,  since  computing  Fourier  transforms  of  long 
pieces  of  data  directly  is  now  economical  in  terms  of  machine 
time  and  storage  space,  a  new  method  has  come  into  vogue. 

We  have  chosen  to  compute  the  spectra  used  in  this  project 
by  direct  multiplication  of  Fourier  transforms.  This  pro¬ 
cedure  is  now  referred  to  as  the  Cooley-Tukey  (C-T)  method. 
(Cooley  and  Tukey,  1 9 G 5 ;  Cooley  et.  al.,  1967;  McCowan,  1966) 


The  starting  point  in  the  P -T  method  is  equation  (2). 
However,  because  the  computatic  of  correlation  functions  is 
costly  in  terms  of  machine  time,  those  generally  used  are 
not  100%  lag  correlation  functions.  The  effect  of  their 
truncation  is  to  introduce  the  spectral  window.  Multipli¬ 
cation  by  a  "boxcar"  function  in  time  is  equivalent  to  con¬ 
volution  by  its  transform,  the  sin  x  function,  in  frequency. 
Thus  the  result  of  the  B-T  methodxis  the  100%  lag  power 
spectra  smoothed  by  this  spectral  window.  The  more  lags 
that  are  used,  the  narrower  is  the  spectral  window.  There¬ 
fore  the  correlation  functions  are  computed  out  only  as  far 
as  required  to  give  a  balance  between  stability  and  accuracy 
on  the  one  hand  and  economy  in  computing  time  on  the  other. 

The  C-T  method  and  the  necessary  smoothing  that  is 
chosen  produce  a  different  set  of  constraints  and  criteria 
to  be  optimized.  The  method  is  superior  only  when  the  length 
of  the  time  series  is  a  highly  composite  number.  In  parti¬ 
cular  our  program  works  only  when  the  number  of  points  in  the 
time  series  is  a  power  of  two.  When  this  is  so  the  C-T 
method  produces  a  N+l  point  raw  power  spectral  density  function, 
where  N  =  2  is  the  length  of  the  data.  Since  N  is  of  the 
order  of  4096  or  2048,  this  represents  a  large  amount  of  infor¬ 
mation.  In  fact  these  N+l  points  are  the  actual  transform  of 
the  100%  lag  correlation  function.  Because  a  set  of  auto  and 
cross  spectra  for  13  or  21  channels  of  data  each  with  4097 
frequencies  between  dc  and  the  folding  frequency  greatly  ex¬ 
ceeds  the  capacity  of  our  machine,  it  is  reasonable  to  sac¬ 
rifice  some  of  these  degrees  of  freedom  to  smooth  and  to 
reduce  the  spectra  to  a  more  manageable  size.  Therefore  we 
have  chosen  to  apply  a  smoothing  and  decimation  procedure 

several  times  to  reduce  the  raw  spectra  to  something  like  65 
or  129  points  each. 

Our  procedure  consists  of  convolving  the  power  spectra 
with  the  three  point  operator,  (*,%,%),  and  then  taking  every 
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other  point  of  the  result.  To  see  what  this  does  to  the  100% 
lag  correlation  function  we  again  use  the  convolution  theorem. 
Convolution  by  in  frequency  is  equivalent  to  multi¬ 

plication  by  its  Fourier  transforn  in  time  which  is: 

2iri^c  -27rjk 

F(k)  =  h  +  hH  2N  +  hSL  2N  =  >5(1+cos2§)  =  cos2(^) 

(6) 

The  100%  lag  correlation  function,  which  includes  N-l  points 
on  either  side  of  the  zero  lag  value,  is  therefore  multiplied 
by  this  function  or  window.  Taking  every  other  point  of  the 
result  cuts  in  half  each  side  of  the  windowed  correlation 
function.  A  gcod  measure  of  how  much  error  is  introduced  by 
L  of  these  procedures  is  the  relative  energy  folded  back  into 
the  lower  lags  of  the  correlation  function. 


The  advantages  of  the  C-T  method  are  numerous;  but,  the 
two  major  features  are  its  inherent  speed  when  used  in  con¬ 
junction  .,Tith  the  fast  Fourier  transform  and  the  unrestricted 
choice  of  smoothing  it  permits.  For  reasonable  choices  of 
smooth:. ng  it  cannot  fail  to  be  more  accurate  because  weighted 
information  from  all  lags  of  the  correlation  function  appears 
in  the  spectra.  We  have  noticed  that  an  effect  of  this  is  to 
give  more  detail  to  the  spectra,  for  equivalent  degrees  of 
freedom,  while  still  following  closely  the  overall  behavior 
of  B-T  spectra. 

3.  Frequency-Wavenumber  Spectra.  The  spatial  information 
of  an  array  of  seismometers  can  be  used  in  conjunction  with 
their  frequency  spectra  to  yield  velocity  distribution 


information  about  the  data.  This  technique  consists  of  com' 
puting  frequency-wavenumber  spectra  and  then  viewing  the  re¬ 
sulting  plots  in  polar  coordinates.  The  distance  from  the 
origin  is  related  to  the  velocity  by: 


Our  convention  on  direction  is  that  the  arrow  pointing 
from  the  origin  to  the  noise  peak  on  the  diagram  is  opposite 
m  direction  to  the  propagation  vector  of  the  data.  In  other 
words,  the  f-k  plots  in  this  report  represent  the  direction 
from  which  the  data  is  coming  rather  than  the  direction  toward 
which  it  is  propagating. 

Just  as  regular  power  spectra  are  the  squares  of  a  time 
transform,  f-k  spectra  are  the  squares  of  a  space  transform. 

In  the  continuous  case  this  is: 


P(f,k) 


M 


dx  f(t,x) 


exp[-2iri(ft-k 


(9) 


where  for  convenience  we  have  written  the  set  of  data  as 


f (t ,x) . 

Expanding  the  square  gives: 

P(f,k)  =  J  dt1  j  dx1  f(t1,x1)  exp[2TTi(ft1-k  .  5^)] 

|  dt2  J  dx2  f(t2,x2)  exp[-2iri(ft2-k  •  x2>]  (10) 


Now  changing  to  new  variables  q 


P(f,k)  =  j  dq  j 


ds 


—  tg-ti  and  s  =  x2~x^ 

f(ti**i>  f^<l+ti,i+*1) 


exp[-27r i (fq-k  •  i)j  (11) 

Ihe  bracketed  quantity  is  called  the  time-space  correlation 
function.  Equation  (11)  states  that  an  f-k  spectrum  is  the 
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time-space  Fourier  transform  of  a  time-space  correlation 
function. 

However,  this  expression  simplifies  for  the  case  when 
the  data  f(t,x)  are  sampled  at  the  set  of  coordinates  {xi>. 
In  this  case  f(t,x)  can  be  written: 


f(t,x)  =  £  f  .  (t)  <5  ( x-x .  ) 

•  X 


Substituting  this  and  integrating  over  x,  gives: 


P(f,k)  = 

f  dq  f  dS  H 

f  dt,f.(t,)  f . (q+t1)6(xi-x.+s) 

J  J  J 

}  i  3 

exp[-2iri(fq-k  •  s)]  (13) 


Inserting  the  definition  of  the  correlation  functions,  equation 
(1),  and  integrating  over  s  gives: 


P(f,k)  =  I  l 
i  3 


dq  Rij  (q )  exp[-2iTifq  j  exp[+  2iTik- (x^x^  )  ] 

(14) 


The  bracketed  quantities  here  are  the  spectral  m^'-ix  elements, 
in  terms  of  which  the  final  result  can  be  written: 

P(f,k)  =  l  l  Si.(f)  exp[+2frik*  (xi-x.  )  ]  (15) 

i  ]  1] 

This  formula  is  also  applicable  to  the  calculation  of  filter 
f-k  responses  where  again  the  time-space  function  is  defined 
only  at  a  given  set  of  coordinates. 

Inherent  in  this  formalism  is  a  window.  Just  as  trans¬ 
forming  a  regular  correlation  function  gave  a  frequency  window, 
so  transforming  a  spatial  correlation  function  will  give  a 
k-space  window.  A  representation  of  this  window  is  the  array 
response  function  which  is  analogous  to  the  transfer  function 
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of  a  filter.  To  obtain  this  we  put  a  time  impulse  into  each 
channel  at  the  same  instant  of  time.  Since  the  spectral 
matrix  of  a  set  of  impulses  is  constant: 


S 


1 


the  array  response  is  given  by: 


(16) 


R(k)  =  l  l  exp[+27riJ<*  (x.-x.  )]  Q7) 

i  J  1  J 

Any  data  displayed  in  an  f-k  spectrum  plot  computed  by 
equation  (15)  will  be  seen  as  the  actual  f-k  spectrum  multi¬ 
plied  by  the  array  response  function.  A  further  use  of  the 
array  response  is  as  a  criterion  in  optimizing  the  array 
geometry;  however,  in  this  report,  we  will  be  concerned  only 
with  given  array  geometries. 

^ *  C°herence  Functions.  A  useful  measure  of  the  correlation 
in  frequency  or  coherence  of  two  data  charnels  is  the  normal¬ 
ized  magnitude  of  the  cross  spectrum.  This  is  called  an 
ordinary  coherence  function: 


,  |  S .  .  (oj)  j  2 

Y  ?  .  ( oj  )  =  -Li- 3  I — 


ij 


S^(oj)  Sj.(w) 


(18) 


An  ordinary  coherence  function  is  a  result  of  the  smoothing 
applied  to  the  spectra.  This  is  because,  for  no  smoothing: 


S..M 


2 


(oj) 


ft 

F  i  (  oj  ) 


F.  (w) 


and  therefore  in  this  case: 


2 

Y  .  . 
ij 


(oj) 


1 


(19) 


(20) 
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However  when  expected  values  are  taken  the  ordinary  coherence 
takes  on  its  customary  meaning.  The  matrix  of  ordinary  co¬ 
herence  functions  follows  from  the  definition  but  is  usually 
referred  to  as  the  normalized  spectral  matrix. 

Two  other  coherence  functions  are  the  multiple  and 
partial  coherences.  It  is  only  possible  to  define  them  with¬ 
in  the  framework  of  least-squares  filter  theory.  This  is  done 
in  the  appendix.  At  this  point  it  is  sufficient  to  say  that 
the  multiple  coherence  between  a  given  data  channel  and  a 
set  of  other  channels  is  a  measure  of  the  amount  of  power  in 
that  channel  that  can  be  represented  by  a  linear  combination 
of  the  other  channels,  and  that  the  partial  coherence  between 
two  given  data  channels  with  the  effects  of  a  set  of  other 
channels  removed  is  the  ordinary  coherence  of  those  two  chan¬ 
nels  with  the  others  prediction-filtered  out. 

B.  Multichannel  Least-Squares  Filters 

With  the  exception  of  the  maximum-likelihood  filter,  all 
of  the  filters  described  in  this  report  are  solutions  of  the 
Wiener-Hopf  equation.  The  maximum-likelihood  filter  itself 
is  a  solution  of  this  equation  in  a  different  form.  However, 
all  of  these  filters  are  multichannel  filters  in  that  they 
satisfy  the  multichannel  convolution  algorithm: 


y(t)  -  l  l  fi(i)  x • ( t—  t ) 

IT  1 

y(w)  =  £  f^(w)  x^(«) 


(21) 


in  the  time  and  frequency  domains  respectively.  Here  y  is 
the  output,  f^  is  the  ith  filter,  and  x^  is  the  ith  input 
data  channel.  In  this  section  we  will  derive  the  Wiener- 
Hopf  equation  in  both  the  time  and  frequency  domains  and  show 
its  application  to  the  multichannel  filters  under  consider¬ 
ation.  We  will  also  only  be  concerned  with  the  discrete  case 
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primarily  because  of  the  simplification  inherent  in  it. 

^ •  The  Wiener-Hopf  Equation.  The  derivation  we  will  give 
of  the  Wiener-Hopf  equation  is  actually  a  derivation  of  the 
general  wave-shaping  filter.  This  we  will  do  first  for  the 
time-domain  case  by  making  use  of  equation  (21)  to  write  an 
expression  for  the  error  in  energy  between  the  output  of  a 
multichannel  filter  and  a  desired  output  d(t).  This  is: 


E  =  l  [y (t )-d(t) ]2  =  m  f  (T)  X.Ct-T)  U  f.(s)  X.(t-s) 
t  tir  -in  J  J 


~2l  I  f-(T)  X.(t-T)  +  l  d (t ) 2 

t  i  1  1  t 

With  the  substitution  q  =  t-i  and  using  the  definition  of 
correlation,  equation  (3),  the  first  terms  can  be  re-written 


E  =  1111  RijCt-s)  f.(T)  f.(s)  -2 U  Rid(T)  f.(T)  +  l  d(t)2  (23) 
ijts  iT  t 

This  step  also  implies  stationarity ,  equation  (5).  E  is  now 
considered  to  be  a  function  of  the  filter  coefficients  f.(t) 
which  are  to  be  varied  in  order  to  produce  a  minimum  value 
of  E.  This  is  accomplished  by  setting  the  derivatives  of  E 
with  respect  to  the  coefficient  equal  to  zero: 


d  £ 

■5?TTpT  =  0  *  2  U  Rki(P-r)  fi(T>  -2  Kkd'P) 


(24) 


The  result  is  the  normal  equations: 

ll  Rkl(p-T>  f.(T)  =  Rkd 
These,  written  in  matrix  notation,  become: 


R  F  =  R 
xx  xd 


(26) 
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XX 

in  the  form: 


xx 


XX 


XX 


XX 


very 

large  matrix 

composed  of 

smaller  matrices 

(0) 

r  (-1) 

r  (-2) 

r  (-L+1 ) 

XX 

XX 

XX 

(1) 

r  (0) 

r  (-1) 

• 

XX 

XX 

• 

(2) 

r  (1) 

XX 

r  (0) 

XX 

• 

• 

rxx(L-1) 


>\ 


r  (0) 

XX 


) 


(27) 


The  smaller  matrices  are: 


C'xx(s):1k3  =  Rkj(s) 


(28) 


and  hence: 


r  (s)=r  (-s) 

XX  XX 


(29) 


Thus  the  matrix  R  is  actually  symmetric  when  taken  element 
by  element.  The  reason  for  writing  it  in  the  form  of  equation 
(27)  is  that  the  standard  method  of  solving  equation  (26), 
the  Levinson  recursion,  (Levinson,  1949)  takes  advantage  of 
this.  In  particular,  this  form  is  commonly  called  a  block 
Toeplitz  matrix.  In  a  similar  fashion  the  other  two  matrices 
in  equation  (26),  F  and  Rxd,  are  also  written  in  terms  of 
smaller  matrices : 


(30) 
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where  these  smaller  matrices  are: 


Cf (s ) ]±  =  f i  ( s )  CMxd (s ) 3i  =  Rid(s)  (31) 

Now  if  L  is  the  number  of  points  in  each  filter  and 
N  is  the  number  of  channels  then  solving  the  Wiener-Hopf 
equation  in  the  time  domain  means  solving  N*L  simultaneous 
linear  equations: 


F  = 


(32) 


However  the  multichannel  Levinson  recursion  requires  not 
(N • L)  operations  for  the  inversion  of  R  ,  but  only  N3L2 
operations.  This  coupled  with  the  fact  that  only  the  first 
column  of  matrices  in  R  need  be  stored  in  the  machine 
makes  solving  the  problem  a  practical  matter. 

In  the  frequency  domain,  the  errors  at  different  fre¬ 
quencies  are  independent,  so  they  can  be  minimized  separately. 
The  error  is  written: 

E(oj)  =  jy(aj)-d(aj)  |  2  =  £  f^(aj)  .  (u)f .  (w)-|  f  .  ( w)x .  (a))d(aj) 

i  J  J  1  i 

-I  fi(w)xi(w)d  (w)  +  d  (co)d(co)  (33) 

Setting  the  derivations  equal  to  zero  gives : 


l  Ski(w)  fi(u)  =  Skd(w)  (34) 

where  the  spectral  matrix  elements  are: 

Sij(ai)  =  E  (xi(w)  Xj(w)}  (35) 

This  problem  is  a  set  of  N  simultaneous  equations  at  each  fre¬ 
quency.  It  is  therefore  much  easier  to  solve  the  shaping  pro¬ 
blem  in  the  frequency  domain,  since  only  LN3  operations  are 
required. 

Prediction  Filters:  If  the  desired  output  is  one  of  the 
inputs  s  points  ahead  in  time: 


13 


(36) 


d(t)  =  x^(t+s) 

then  the  filter  equations  reduce  to: 

l  J  Rki(P_T)  fi(T)  =  Rk£(T+s)  (37) 

T  1 

and  the  filter  is  called  a  time  prediction  filter.  Similarly, 
if  the  desired  output  is  the  sum  of  the  inputs  at  some  span 
ahead  in  time: 

d(t)  =  I  x.(t+s)  (38) 

h  l 
l 

the  filter  equations  become: 

l  l  Rki(p-r)  fi(t)  =  T  Rki(t+s)  (39) 

t  i  i 

Perhaps  a  more  interesting  quantity  is  the  prediction 
error.  This  is,  for  equation  (36): 

y ( t )  -  x£(t+s)  (40) 

and  for  equation  (38): 

y(t)  -  l  xi(t+s)  (41) 

i 

The  prediction  error  should  be  small  when  no  signal  is  present 
in  the  noise  and  large  when  a  signal,  an  unpredictable  event, 
occurs.  (Claerbout,  1964).  The  prediction  error  filter  is 
therefore  a  possible  detection  device. 

Interpolation  Filters:  If  the  desired  output  is  another 
element  of  the  array  not  included  as  an  input  to  the  filter: 

d(t)  =  x^(t)  (42) 

the  filter  equations  are: 

l  l  Rki(P_T)  fi(T)  =  Rk£(T)  (43) 

t  i 
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This  filter  is  called  a  spatial  prediction  or  interpolation 
filter.  The  interpolation  error  is  much  like  the  predict¬ 
ion  error  in  form  being  defined  by: 


y(t)  -  x^(t) 


(44) 


When  the  interpolation  error  is  small,  much  of  the  power 
appearing  in  the  data  channel  can  be  represented  as  a 
multichannel  convolution  of  the  other  channels.  In  a  certain 


sense,  then,  that  channel  is  redundant. 


An  important  use  of  the  interpolation  filter  and  its 
associated  interpolation  error  is  in  defining  multiple  and 
partial  coherence  functions.  These  are  described  in  detail 
in  the  Appendix. 


Maximum  Likelihood  Filters:  The  maximum  likelihood 
filter,  sometimes  referred  to  as  the  minimum-variance  un¬ 
biased  filter.  Levin  filter,  or  max-like  filter,  (Capon  and 
Greenfield,  1965)  is  the  only  filter  presented  in  this  report 
that  cannot  be  derived  directly  from  the  Wiener-Hopf  equa¬ 
tion.  It  is  still,  however,  a  least-squares  filter.  In  this 
section  we  will  derive  the  normal  equations  for  th«  maximum 
likelihood  filter  in  both  the  time  and  frequency  domains.  It 
should  be  remembered  that  while  both  these  filters  are  math- 
ematicelly  equivalent,  the  time-domain  solution  is  the  truly 
optimum  version.  That  is,  for  a  given  filter  length,  the  time- 
domain  maximum  likelihood  filter  gives  optimum  noise  suppression 
while  still  preserving  the  signal.  The  price  paid  for  this  is 
in  stability  and  the  large  amount  of  computer  time  needed  to 
solve  the  filter  equations.  Nevertheless,  most  recent  work  on 
the  maximum  likelihood  filter  and  its  various  uses  has  center¬ 
ed  on  the  time-domain  version.  (Flinn,  et  al.,  1966). 


The  mathematical  trick  used  to  specify  the  maximum  like¬ 
lihood  filter  equations  is  a  summation  constraint  on  the 
filter  impulse  response.  This  was  first  recognized  by  Levin, 
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(Kelley  and  Levin,  1964).  If  the  data  channels  are  represent¬ 
ed  by: 

x^(t)  =  n1(t)  +  s(t)  (45) 

where  n^t)  are  the  noise  processes  and  s(t)  is  the  signal, 
then  the  multichannel  convolution  algorithm  gives: 

y(t')  =  l  1  x.(t-t)  =  £  J  f  •  (  t  )  n.(t-x)  +  l  S(t-T). 

IT  1  IT11  T 

l  fi(T)  (46) 

l 

Now  when  a  summation  constraint  is  introduced: 

l  fi(t)  =  d(T)  (47) 

i 

the  output  can  be  written: 

y(t)  =  £  J  f-(t)  n.(t-x)  +  J  d (t  )  s(t-r)  (48) 

IT  T 

Clearly  if  d(t)  is  chosen  to  be  a  spike  at  some  time  t  =  p, 
then  the  signal  will  be  reproduced  with  a  delay  p: 

y(t)  =  l  l  f^(t)  n^ ( t-T )  +  s(t-p)  (49) 

1  T 

Among  the  obvious  choices  are  putting  the  spike  at  the 
beginning  of  d(t)  and  putting  it  in  the  middle.  The  result¬ 
ing  filters  are  called  the  realizable  and  symmetric  maximum 
likelihood  filters  respectively.  (Flinn  and  Claerbout,  1965). 
It  has  been  observed  that  the  symmetric  maximum  likelihood 
filter,  because  it  is  "two-sided",  produces  a  noticeable  im¬ 
provement  over  the  realizable  filter.  (Flinn  et  al.,  1966). 

A  two-sided  filter  is  able  to  look  ahead  a  bit  and  thus  im¬ 
prove  its  performance.  Since  the  signal  part  of  the  output 
signal  convolved  with  d(t),  some  bandpass  filtering  of  the 
signal  can  be  done  without  any  extra  effort.  In  this  case 
d(t)  would  be  the  impulse  response  of  the  desired  filter. 

The  filter  equations  follow  directly  from  the  least 
squares  condition  modified  by  the  constraint.  The  proper 
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mathematical  technique  to  use  here  is  the  method  of  Lagrange 
multipliers.  We  want  to  minimize  the  energy: 

E  =  1  y<t)  =  l  l  l  f.(t)  x.(t-T)  l  l  f.(s)  x.  (t  -  s) 
t  t  i  t  1  1  j  s  2  D 

=  1111  Ri-i(T  -  s>  f-rCt)  f.(s)  (50) 

i  j  T  s  J  ±  -J 

subject  to  the  constraint: 

1  =  d(t)  (51) 

The  proper  quantity  to  minimize  is  then: 

E=E  +  l  X(T)[  I  f.(T)-d(T)]  =  HU  R  .  .  ( T-S )  f  .  ( T )  f.(s) 

T  i  i  j  TS  13  1  3 

+  l  X(t)[J  f. (t)-d(t)]  (52) 

T  i  1 

Where  X  (t)  is  now  a  new  variable  called  the  Lagrange  multi¬ 
plier.  The  derivatives  with  respect  to  fk(p)  give: 

9  E 

3f  (p)  =  0  =  2  11  Rki(p“T)fi(x)  +  X(p)  (53) 

k  i  t 

or,  by  absorbing  the  extra  factor  of  -2  into  A(p)  and  usin^ 
the  matrix  notation: 


ExxF  = 

where 

CX]p  =  X(p) 


(54) 


Another  equation  is  obtained  by  taking  derivatives  with  re¬ 
spect  to  X (p ) : 

l  fi(p)  =  d(p)  (55) 

This  can  be  put  into  matrix  form  by  defining  a  new  matrix  G 
composed  of  smaller  matrices  g(t): 
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G 


(56) 


fg(0)l 
g(l) 

An  example  of  G  for  N 

0 


G  = 


n 


0 

0 


v 


1 

1 


Cg(T)]ij  = 


2  and  L  =  3  is: 
o' 

0 


■\ 


0 

0 


(57) 


With  this  definition  of  G,  the  second  equation  can  be  written: 


G  F  =  D  or  [Djp  =  d(p) 


(58) 


The  two  equations  which  specify  the  maximum  likelihood  filter 
are  then: 


V  =  GX 


T 

GF  =  D 


(59) 


These  can  be  solved  by  some  matrix  manipulation  to  give: 

F  =  R-1G  (GTR-1G)-1D  (60) 

In  the  frequency  domain  it  can  be  shown  that  the  maximum 
likelihood  filter  is  defined  by  a  similar  formula.  We  will 
not  actually  derive  it  but  will  show  that  it  corresponds  very 
closely  to  the  result  obtained  above.  The  correlation  matrix 
becomes  the  spectral  matrix.  The  matrix  G  is  a  similar  sum¬ 
mation  matrix  and  the  constraint  vector  reduces  to  a  constant, 
usually  chosen  to  be  1. 


f  (a>)  =  i  s:t  (o»  /  i  i  s:j  (a)) 

1  j  13  ij  1] 


(61) 


S  (a>)  is  taken  here  to  be  the  inverse  of  the  spectral  matrix. 
The  chief  computational  advantage  of  solving  the  problem  in 
frequency  is  that  it  can  be  done  frequency  by  frequency.  Only 
the  inversion  of  each  Hermitian  N-by-N  spectral  matrix  is  re¬ 
quired. 


Isotropic  Processors.  A  further  assumption  usually  made 
in  least-squares  filter  theory  which  simplifies  the  resulting 
equations  is  that  the  signal  and  noise  are  uncorrelated  or  in¬ 
coherent.  Now j  if  the  inputs  into  equation  (34)  are  taken 


to  be : 


xi(o>)  =  N±  (clj  )  +  Qi(oi) 


(62) 


this  assumption  reduces  equation  (34)  to: 


I  tNki(a,)  +  Qki<w)]  f.(«)  =  Skd(a>) 


(63) 


The  cross  term  cancels  out.  Furthermore  if  the  desired  out¬ 
put  is  chosen  to  be  some  sort  of  signal  the  equation  is: 


I  CNk3.(u)  +  Qki(w)3  f.(cu)  =  Qkd(w) 


(64) 


where  N  is  the  spectral  matrix  of  the  noise  and  Q  is  the 
spectral  matrix  of  the  signal.  It  has  been  shown  by  Burg, 
(Burg,  1964;  Spieker,  et  al.,  1961)  that  good  filters  can  be 
designed  from  equation  (64)  using  an  analytical  signal  spectral 
matrix  and  either  an  analytical  or  a  measured-noise  spectral 
matrix.  These  filters  we  will  call  the  theoretical  and  measur¬ 
ed  noise  isotropic  processors . 


For  analytical  models,  we  will  calculate  what  the  spectral 
matrix  elements  are  for  given  velocity  distributions.  In  all 
the  following  cases  the  velocity  distribution  is  assumed  to 
be  independent  of  frequency.  The  first  model  is  the  point 
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model.  Data  traveling  with  a  given  vector  velocity  v  has 
spectral  matrix  elements  of  the  form: 


S..(f)  =  P(f)  expf^ifv  .  (x.-x.)l 
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(65) 


Here  P(f)  is  the  power  spectrum  which  is  assumed  to  be  the 
same  for  all  channels .  The  ring  model  for  data  with  a  given 
scalar  velocity  v  is  obtained  by  integrating  equation  (65): 


1  f27r 

J  expC2Tri|k|  |Xi-x.  |cose]  kd0  =  Jo(2ir|Jc|  |x.-x.|)  (66) 


The  ring  model  therefore  has  spectra  of  the  form: 


Si.(f)  =  P(f)  Jo 


(67) 


Two  other  models  can  be  derived  by  integrating  equation  (65) 
over  certain  regions  in  K  space.  These  are  the  disc  model, 
v  >  vs>  and  the  annular  ring  model,  vL  <  v  <  vH.  The  spectra 


for  them  are; 


S,.(f)  = 


P(f)v 
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There  are  two  other  spectral  matrices  which  are  useful  to  de¬ 
fine.  These  are  the  infinite  velocity  and  incoherent  matrices. 
Their  elements  are: 
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* 

* 


(70) 


S^Cf)  =  P(f) 

S±j (f )  =  P(f)  6 j  (71) 

The  scheme  therefore  is  to  insert  combinations  of  these 
models  into  equation  (64).  As  a  simplification,  the  desired 
output  is  chosen  to  be  the  signal  which  would  be  seen  at  the 
origin  of  the  array,  x^  =  0.  A  parameter  is  also  inserted 
to  allow  varying  the  signal  to  noise  ratio  of  the  filters. 
With  these  assumptions  equation  64  is: 

l  [Nki(f)  +  c  Qu.  (f ) ]  f±(f)  =  Qko(f).  (72) 

The  theoretical  isotropic  processor  uses  an  annular  ring 
model  for  the  noise  and  either  a  disc  or  an  infinite  velocity 
signal  model.  P(f)  is  chosen  to  be  1  in  this  case.  For  the 
measured  noise  isotropic  processor,  an  actual  spectral  matrix 
computed  from  a  sample  of  noise  is  used  with  the  same  choice 
of  signal  models.  However  here,  P(f)  is  chosen  to  be  a  cer¬ 
tain  noise  power  spectrum. 

One  further  modification  of  the  signal  models  is  neces¬ 
sary  to  make  the  resulting  filter  response  functions  stable. 
In  actual  practice  the  relative  magnitudes  of  the  input  data 
channels,  or  gain  corrections,  are  not  accurately  known. 

The  least-squares  equations  can  taJce  advantage  of  this  fact 
and  give  solutions  which  do  not  in  general  reduce  the  noise 
level.  Another  way  to  see  this  is  that  the  signal  matrix 
Q  is  nearly  singular.  It  specifies  the  signal  model  too  ac¬ 
curately.  Consequently  some  relaxation  of  the  signal  model 
is  necessary  to  stabilize  the  computation.  The  most  natural 
way  to  do  this  is  to  add  a  few  percent  white  noise  to  the 
model.  A  good  figure  is  three  percent,  which  can  be  shown 
to  allow  an  even  distribution  of  gain  corrections  between 
.7  and  1.3.  Our  programs  are  designed  to  do  this. 

RESULTS 

The  results  of  this  study  are  presented  in  Table  3  ,and 
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in  Figures  3  through  10.  The  first  three  figures  show  char¬ 
acteristics  of  the  array  and  the  data;  the  rest  of  the 
figures  show  multichannel  filter  outputs  and  evaluations. 
Table  3  lists  the  actual  S/N  improvement  obtained  by  each 
method.  Viewed  together,  these  results  represent  a  large 
amount  of  information.  This  section  will  explain  each  re¬ 
sult  and  also  discuss  the  various  comparisons  which  can  be 
made . 

The  array  response  function,  contour-plotted  in  Figure 
3,  demonstrates  the  basic  capability  of  the  array.  To  some 
degree  it  is  reflected  in  all  of  multichannel  filter  f-k 
response  plots  that  follow.  This  response  has  a  reasonably 
small  central  lobe  but  also  six  prominent  side  lobes.  The 
number  of  side  lobes  reflects  the  somewhat  triangular  nature 
of  the  array  and  their  amplitude  is  a  direct  result  of  only 
using  ten  sensors.  The  sharp  central  lobe  comes  from  using 
well-spaced  sensors.  Adding  more  sensors  would  reduce  the 
side  lobes  by  a  gradual  averaging  process  but  would  not 
appreciably  increase  the  wavenumber  resolution  of  the  array. 
Generally  speaking,  the  expected  result  of  beamforming  this 
array  would  be  good  signal  enhancement  at  high  frequencies 
but  mediocre  noise  rejection;  since  noise  will  leak  in 
though  the  side  lobes. 

Plots  of  the  actual  data  are  shown  in  Figures  4  and  5. 
Figure  4  is  the  raw  data  and  Figure  5  is  the  same  data  band¬ 
pass  filtered.  The  incident  P  wave  used  as  the  signal  can 
be  seen  towards  the  end  of  both  seismograms.  The  amount  of 
noise  in  front  of  the  signal  in  both  figures  is  not  re¬ 
presentative  of  all  there  was  on  the  tape.  In  actuality, 
approximately  200  seconds  (4096  points  at  20  samples  per 
second)  of  noise  were  available  to  the  programs. 

The  noise  f-k  spectra  shown  for  four  frequencies  in 
Figure  8  were  computed  using  all  200  seconds  of  noise.  The 
raw  spectra  in  the  spectral  matrix  were  smoothed  and  deci¬ 
mated.  five  times  from  4097  to  129  points  with  our  triangular 


convolution  and  decimation  procedure.  This  gave  32  degrees 
of  freedom  to  each  frequency  estimate  and  a  resolution  of 
0.078  cps . 

The  most  salient  feature  of  those  four  plots  is  the 
presence  of  high-velocity  seismic  noise.  Any  of  the  multi¬ 
channel  filters  computed  from  a  measured  noise  sample  should 
experience  difficulty  in  enhancing  a  high  velocity  signal 
and  rejecting  high  velocity  noise.  A  general  characteristic 
of  these  noise  spectra  is  the  influence  of  the  array  response. 
This  effect  is  most  clearly  seen  in  the  0.39  cps  plot  where 
the  area  enclosed  by  the  -3  db  contour  is  almost  a  third  of 
the  whole  plot.  At  high  frequencies  the  effect  of  the  array 
response  is  smaller  but  above  the  1.25  cps  the  noise  field 
is  almost  flat  and  therefore  uninteresting.  The  meaning  of 
all  this  is  evident:  this  array  is  just  not  suitable  for 
investigating  low-frequency  noise. 

Figures  6  and  7  show  the  output  of  various  multichannel 
filters  as  well  as  the  straight  sums  for  un-pref iltered  and 
prefiltered  data  respectfully.  The  fitting  interval  for  the 
maximum-likelihood  filter  was  150  seconds,  ending  just  short 
of  the  signal  arrival  time.  The  fitting  interval  for  the 
measured-noise  isotropic  processors  was  approximately  100 
seconds  (2048  points  at  20  samples  per  second),  ending  be¬ 
fore  the  origin  time  of  the  plot.  The  theoretical  isotropic 
processor,  of  course,  is  not  computed  from  any  actual  data. 

The  maximum  likelihood  filter  was  thirty  points  long 
sampled  at  10  samples  per  second  giving  it  an  effective  length 
of  three  seconds.  It  required  31  minutes  of  CDC  1604B  time 
to  calculate,  16  minutes  for  the  correlation  matrix  and  15 
minutes  for  the  filter.  The  theoretical  and  measured-noise 
isotropic  processors  were  255  points  long,  sampled  at  20 
samples  per  second,  giving  them  effective  length  of  almost 
thirteen  seconds.  The  measured-noise  isotropic  processors 
required  35  minutes  of  computer  time  to  calculate,  20  minutes 
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for  the  spectral  matrix  and  15  minutes  for  the  filter.  The 
theoretical  isotropic  processors  required  12  minutes  of  com¬ 
puter  time  10  calculate . 

Table  3  lists  the  actual  improvement  in  S/N  ratio  over 
the  un-prefiltered  straight  sum  for  the  various  multichannel 
filters.  Simply  bandpass  filtering  the  sum  gets  15  db  im¬ 
provement  in  S/N  ratio  at  practically  no  expense  of  computer 
time,  so  the  real  comparison  is  with  this  reference  level. 
The  amplitude  of  the  noise  was  computed  for  the  measured 
noise  multichannel  filter  outputs  over  an  interval  of  200 
seconds,  including  the  fitting  interval  and  100  seconds  out¬ 
side  the  fitting  interval.  The  maximum- likelihood  noise 
amplitude  was  also  computed  over  an  interval  of  200  seconds, 
but  this  time  including  the  fitting  interval  and  50  seconds 
outside  it.  Therefore,  these  noise  levels  represent  an 
average  of  the  rms  noise  amplitude  inside  fitting  interval 
and  outside  it.  This  problem  does  not,  of  course,  arise 
with  the  straight  sums  and  theoretical  isotropic  processor 
outputs . 

We  can  make  several  comparisons  in  this  table.  The 
first  concerns  the  amount  of  frequency  filtering  that  is 
done.  The  illusory  gains  from  the  maximum-likelihood  filter 
operating  on  un-prefiltered  noise  come  about  because  the 
least-squares  equation  does  not  constrain  the  filter  to  do 
otherwise.  Many  degrees  of  freedom  in  ,he  filter  weights 
are  misused  doing  what  could  be  done  with  just  elementary 
processing,  i.e.,  bandpass  filtering.  Apparently  all  but 
2  db  of  the  original  15  db  can  be  accounted  for  by  bandpass 
filtering.  On  the  other  hand,  the  design  equation  for  the 
measured-noise  multichannel  filter  specifies  as  a  desired 
signal  spectrum  one  of  the  noise  spectra.  Therefore,  some 
frequency  filtering  is  expected.  The  design  equation  for 
the  theoretical  multichannel  filter  assumes  a  flat  spectrum 
for  the  desired  signal  output.  However,  the  filter  which 
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satisfies  this  equation  is  only  the  optimum  filter  in  the 
least-squares  sense,  so  we  should  expect  some  but  not  much 
frequency  filtering.  This  is  demonstrated  in  the  table. 

In  summary  then  the  maximum  likelihood  multichannel  filter 
is  the  worst  regarding  frequency  filtering  and  the  theo¬ 
retical  multichannel  filter  is  the  best. 

Secondly,  in  both  cases,  the  theoretical  and  the  mea¬ 
sured  noise  filters  designed  with  a  disc  signal  mode]  (V  >_ 

10  km/sec)  performed  better  than  the  infinite-velocity  filters. 
This  fact  is  true  in  general  and  is  a  direct  result  of  not 
beamsteering  the  array.  However,  other  results  discussed 
below  suggest  that  this  gain  occurs  only  in  a  frequency  band 
higher  than  the  signal  band.  For  the  prefiltered  theoretical 
multichannel  filter,  designed  on  noise  where  this  energy  has 
been  removed,  this  difference  only  amounts  to  0.3  db. 

Finally,  as  regards  isotropic  processor  multichannel 
filters,  the  theoretical  processor  is  always  better  than 
measured-noise  processor.  This  is  a  strange  and  completely 
unexpected  result,  probably  arising  from  the  fact  that  the 
noise  seems  to  overlap  the  signal  model  in  k  space,  as  Fig¬ 
ure  8  shows.  For  a  measured-noise  multichannel  filter  this 
represents  a  contradiction  of  specification  in  the  least- 
squares  design  equations  which  could  lend  again  to  a  misuse 
of  the  degrees  of  freedom  available  in  the  filter  weights. 

The  theoretical  multichannel  filter  is  not  computed  from 
actual  data  and  hence  is  not  affected. 

Figure  9  shows,  for  two  frequencies,  the  f-k  transfer 
functions  of  each  multichannel  filter.  In  this  figure  alone 
the  direction  of  the  k  vectc r  is  opposite  to  that  of  our 
original  definition  and  the  definition  in  the  other  figures. 
Here  the  k  vector  points  in  the  direction  toward  which  an 
equivalent  wave  front  would  be  moving.  These  plots  show 
clearly  how  for  both  isotropic  processors  the  infinite- 
velocity  signal  model  provides  a  sharper  signal  band  at  1.25 


cps.  The  passband  of  the  theoretical  multichannel  filters, 

2  to  4  km/sec.,  is  also  evident  in  these  plots.  Although 
comparison  is  difficult  because  of  the  difference  in  the 
direction  of  the  k  vector,  it  is  possible  to  see  that  some 
of  the  noise  peaks  in  Figure  8  are  nulled  by  the  measured- 
noise  multichannel  filter  f-k  transfer  function.  Unfortu¬ 
nately  not  all  the  noise  is  reduced.  It  is  also  possible 
to  see  small  differences  in  the  k-space  passbands  due  to 
changes  from  un-prefiltered  to  prefiltered  noise  in  the 
pertinent  multichannel  filter  fitting  intervals.  Perhaps 
nulling  noise  by  a  strong  bandpass  filter  actually  destroys 
some  of  its  spatial  information.  In  passing,  we  note  that 
another  deficiency  of  the  maximum-likelihood  filter  can  be 
seen  here,  for  more  than  any  other  multichannel  filter  the 
maximum-likelihood  multichannel  filter  produces  a  compli¬ 
cated  and  generally  strange  k-space  pattern.  It  may  be  that 
many  degrees  of  freedom  are  lost  in  generating  this  unneces¬ 
sarily  sharp  response. 

Figure  10  shows  plots  of  the  performance  factor  of  each 
multichannel  filter  (in  terms  of  S/N  ratio  improvement)  to 
a  perfectly  correlated  signal  in  the  presence  of  completely 
uncorrelated  noise.  The  performance  factor  is  defined  by: 

IJ 

P<»>  =  — - 2  (73) 

I  | f ±Cw) | 

i 

These  plots  confirm  the  earlier  conjecture  that  the  infinite- 
velocity  signal  model  isotropic  processors  work  best  on  noise 
hxgher  than  1  cps.  This  is  most  dramatic  in  the  case  of  the 
theoretical  multichannel  filter  where  the  performance  tends 
asymtotically  to  1.  Here  again  is  illustrated  the  general 
strangeness  of  the  maximum-likelihood  multichannel  filter; 
whose  performance  factors  stand  out  as  unlike  the  others. 
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One  particular  difficulty  of  the  maximum-likelihood  multi¬ 
channel  filter  is  its  poor  performance  at  low  frequencies; 
which  is  most  likely  due  to  its  relatively  short  impulse 
response  (three  seconds  as  compared  to  thirteen  seconds  for 
the  other  filters). 

CONCLUSIONS 

The  conclusions  listed  below  are  based  upon  results 
obtained  from  processing  only  one  event.  In  view  of  this 
they  should  be  accepted  with  due  caution. 

1.  Designing  multichannel  filters  on  un-pref iltered 
noise  gives  an  illusory  S/N  ratio  improvement  which  is  not 
attained  with  similar  multichannel  filters  designed  on 
pref iltered  noise. 

2.  Isotropic  processors  can  be  effectively  applied  to 
un-beamsteered  arrays,  provided  a  finite  velocity  signal 
model  is  used. 

3.  Measured-noise  isotropic  processors  are  not  effect¬ 
ive  in  improving  S/N  ratio  when  the  noise  has  large  high- 
velocity  components.  'In  this  case  the  theoretical  multi¬ 
channel  filter  is  the  superior  processor. 

4.  With  the  exception  of  the  maximum-likelihood  filter 
evaluated  for  performance  within  its  fitting  interval,  only 
the  theoretical  multichannel  filter  showed  a  gain  over  the 
prefiltered  straight  sum.  That  gain  was  slightly  more  than 
1  db,  which  is  not  regarded  as  significant. 

5.  One  drawback  of  the  maximum-likelihood  filter  is 
its  unnecessarily  complex  response  in  k-space.  It  does  not 
appear  that  modifying  the  constraint  will  cure  this. 
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Multiple  and  Partial  Coherence  Functions.  The  multiple 
coherence  function  of  a  data  channel  is  a  measure  of  how 
much  power  on  that  channel  can  be  interpolated  from  a  set 
of  other  channels.  (Bendat  and  Piersol,  1966).  Imagine 
an  array  configuration  of  four  sensors  at  the  corners  of 
a  square: 


A 

x. 


X. 


The  interpolation  1 liter  equations  to  interpolate  channel 
A  from  channels  x2,  and  x3  are: 


sxx(u^  =  sxA<w) 


or  written  out 
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is 

now  unders 
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(75) 


interpolated  output  A  is  then: 


A  =  +  x2F2  +  x3F3  (76) 

from  the  multichannel  convolution  algorithm.  If  the  filter 
F  is  able  to  completely  interpolate  A,  in  the  least-squares 
sense,  then  the  error: 

A 

A  -  A  (77) 

will  be  incoherent  with  A.  That  is: 

E  {A(w)[A(w)  -  A(w)]  }  =  0  (78) 
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which  is: 


SM  *  SM 


Now  because  of  the  convolution  algorithm  this  can  be  written: 

SAA  =  SA1F1  +  SA2F2  *  SA3F3  <79) 

Of  course  this  is  still  only  if  the  filter  can  account  for  all 
of  the  predictable  power  in  A.  A  set  of  equations  that  com¬ 
bines  both  this  constraint  and  the  filter  equations  themselves 
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expansion  of  equation  (80)  along  the  indicated  partition 
gives  both  equations  (75)  and  (79).  In  matrix  form  this 
equation  is: 

S'  F'  =  0  (81) 

where  ..s  now  defined  to  be  the  augmented  spectral  matrix. 
For  a  non-trivial  solution,  the  augmented  spectral  matrix 
must  be  singular.  An  expression  for  its  determinant  is: 


SA | x  Z  SAA  _(SA1  SA2  SA3)S 


det  S' 
det  S 


(82) 


The  multiple  coherence  function  is  a  relative  measure  of 
this  determinant  or  equivalently,  of  the  solubility  of  these 
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equations : 


det 

(det  S)SM 


(83) 


When  the  condition  (81)  and  hence  the  conditions  (75)  and 
9  ,  x.  true,  det  S'  =  0  and  the  multiple  coherence  is  1. 
When  less  and  less  power  on  channel  A  can  be  interpolated 
from  the  other  channels ,  the  multiple  coherence  tends  toward 


The  partial  coherence  function  (Bendat  and  Piersol,  1966) 

is  the  oridnary  coherence  between  two  interpolation  errors. 

Imagine  an  array  configuration  with  elements  the  corners  of 
a  square. 


The  filter  equations  to 
are : 


interpolate  A  and  B  from 


and  x2 


which  are  in  matrix  notation: 


S  F  =  G 

The  interpolated  outputs  are  now 
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The  two  errors  in  interpolation  are 


The  spectral  matrix  of  these  two  errors,  which  is  frequently 
called  the  conditioned  spectral  matrix,  is: 


=  EE+  = 

1 a\  /aN 

A*  A*  a  a 

(A  B  )  -  (A  B  ) 

1  A  1  "  I 

\B/  \bJ 

where  of  course  expected  values  are  to  be  taken.  The  trans 
pose  occurs  on  the  left  hand  side  of  equation  (89)  because 
this  definition  of  spectra  is  the  complex  conjugate  of  our 
previous  definition,  equation  (19). 
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The  partial  coherence  between  channels  A  and 
nary  coherence  of  this  2-by-2  matrix: 

2  Sc(l,2)  S  c  C  2 , 1 ) 

a,b|x1,x2  s^TITTr^TTr^T 


B  is  the  ordi- 


(91) 
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TABLE  I 


MULTICHANNEL  FILTER  REPORT 
Location  of  Event 


10  November  1965 
Aleutian  Islands 
Magnitude*:  4.1 

Origin  Time*:  03:48:78.0  GMT 
Latitude*:  51.2°  N 

Longitude*:  178.8°  W 

Focal  Depth*:  "33  km" 


Epicentral  Distance  to  LASA  A010:  46.1°  (5126  km) 

Azimuth  of  Epicenter  from  LASA  A010:  304° 

Surface  Velocity  at  LASA  A010:  14  km/sec 


*Data  from  USC€S  Preliminary  Determination  of  Epicenter 
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TABLE  2 


MULTICHANNEL  FILTER  REPORT 


Array  Configuration 


Channel  Channel  X  Coordinate  Y  Coordinate  Demagnif ication 


(km) 


0 . 000 


-0.716 


0.745 


1.610 


3  .326 


0.204 


0.409 


-2.609 


-0.950 


-1.900 


(km) 


0.000 


3.426 


0.667 


1.440 


-1.093 


-0.979 


■1.958 


-2.3: 


(counts/lO-^ 
microns ) 


TABLE  3 


MULTICHANNEL  FILTER  REPORT 


S/N  Ratio  Improvement 

Filter 

Data 

Signal  Model 

db  Improvement 

Unphased  Sum 

UBPF 

-- 

0 

Theoretical  MCF 

UBPF 

10  km/ sec 

7 

Theoretical  MCF 

UBPF 

Infinite 

4.8 

Maximum  Likeli¬ 
hood 

UBPF 

— 

15 

Measured-Noise 

MCF 

UBPF 

10  km/sec 

3 

Measured- Noise 

MCF 

UBPF 

Infinite 

-6 

Unphased  Sum 

BPF 

— 

15 

Theoretical  MCF 

BPF 

10  km/sec 

16.4 

Theoretical  MCF 

BPF 

Infinite 

16.1 

Maximum  Likeli¬ 
hood 

BPF 

— 

17 

Measured-Noise 

MCF 

BPF 

10  km/ sec 

10.3 

Measured-Noise 

MCF 

BPF 

Infinite 
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IS  ABSTRACT 

The  theory  of  optimum  least-square?  multichannel  filters 
is  developed  from  the  definitions  of  correlation  functions  and 
power  spectra.  Four  types  of  these  optimum  processors  were 
applied  to  an  Aleutian  earthquake  recorded  by  part  of  the 
Montana  LASA  and  the  results  evaluated.  It  appears  that  certain 
isotropic  processors  can  give  a  modest  signal-to-noise  ratio 
improvement  in  excess  of  1  db  over  the  prefiltered  straight  sum. 
Evaluation  of  the  outputs  and  transfer  functions  of  these  filters 
gives  some  insight  into  methods  for  improving  their  gain  and 
stability.  Several  other  topics  are  discussed  from  a  theoretical 
point  of  view;  these  include  estimation  of  power  spectral  density 
functions,  frequency-wavenumber  spectra,  and  certain  coherence 
functions.  - - 
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